{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 《强化学习：原理与Python实现》更新与勘误\n",
    "\n",
    "（2019年12月第1版第3次印刷）\n",
    "\n",
    "### 行数计算方法\n",
    "\n",
    "本勘误文档中，行数计算“第$i$行”（$i=0,1,2,\\ldots$）是从0开始计数的。小节标题、公式、内联代码、注意、本章要点记入行数，章标题、图、表、代码清单及它们的题注不计入行数。空行不计入行数。\n",
    "\n",
    "$\n",
    "\\newcommand{\\sfA}{\\unicode{x1d608}}\n",
    "\\newcommand{\\sfS}{\\unicode{x1d61a}}\n",
    "\\newcommand{\\sfa}{\\unicode{x1d622}}\n",
    "\\newcommand{\\sfs}{\\unicode{x1d634}}\n",
    "\\newcommand{\\bftheta}{\\pmb{\\unicode{x3B8}}}\n",
    "\\newcommand{\\E}{\\textrm{E}}\n",
    "$\n",
    "\n",
    "## 第20页第2行\n",
    "\n",
    "作者注：\n",
    "这种带折扣的回报定义既可以用于回合制任务，也可以用于连续性任务，是一种统一的表示。\n",
    "\n",
    "\n",
    "## 第20页第9行\n",
    "\n",
    "为$\\bar{R}=\\lim\\limits_{t\\to+\\infty}\\E\\left[\\frac{1}{t}\\sum\\limits_{\\tau=0}^{t}R_\\tau\\right]$\n",
    "\n",
    "### 改为\n",
    "\n",
    "为$\\bar{R}=\\lim\\limits_{t\\to+\\infty}\\E\\left[\\frac{1}{t}\\sum\\limits_{\\tau=1}^{t}R_\\tau\\right]$\n",
    "\n",
    "## 第28页第0行\n",
    "\n",
    "$\\Delta=\\left(1-\\gamma\\right)\\left(1-\\left(1-\\alpha{x}-\\beta{y}\\right)\\right)>0$\n",
    "\n",
    "### 改为\n",
    "\n",
    "$\\Delta=\\left(1-\\gamma\\right)\\left(1-\\left(1-\\alpha{x}-\\beta{y}\\right)\\gamma\\right)>0$\n",
    "\n",
    "\n",
    "## 第56页第10行\n",
    "\n",
    "同时满足$\\left\\{\\alpha_k:k=1,2,\\ldots\\right\\}$\n",
    "\n",
    "### 改为\n",
    "\n",
    "$\\left\\{\\alpha_k:k=1,2,\\ldots\\right\\}$同时满足\n",
    "\n",
    "\n",
    "## 第91页算法5-13第2.3.5步\n",
    "\n",
    "### 改为\n",
    "\n",
    "2.2.5（更新价值）$q\\left(\\sfs,\\sfa\\right)\\leftarrow{q}\\left(\\sfs,\\sfa\\right)+\\alpha{e}\\left(\\sfs,\\sfa\\right)\\left[U-q\\left(\\sfS,\\sfA\\right)\\right],\\sfs\\in\\mathcal{S},\\sfa\\in\\mathcal{A}\\left(\\sfs\\right)$;\n",
    "\n",
    "\n",
    "\n",
    "## 第92页算法5-14第2.3.5步\n",
    "\n",
    "### 改为\n",
    "\n",
    "2.2.5（更新价值）$v\\left(\\sfs\\right)\\leftarrow{v}\\left(\\sfs\\right)+\\alpha{e}\\left(\\sfs\\right)\\left[U-v\\left(\\sfS\\right)\\right],\\sfs\\in\\mathcal{S}$。\n",
    "\n",
    "\n",
    "## 第96页代码清单5-6第14行\n",
    "\n",
    "```\n",
    "        v = (self.q[next_state].sum() * self.epsilon + \\\n",
    "```\n",
    "\n",
    "### 改为\n",
    "\n",
    "```\n",
    "        v = (self.q[next_state].mean() * self.epsilon + \\\n",
    "```\n",
    "\n",
    "\n",
    "## 第118页\n",
    "\n",
    "作者注：\n",
    "\n",
    "砖瓦编码是一种历史悠久的特征构造方法，可用于回归、分类等问题。目前学术界倾向于用神经网络代替砖瓦编码来构造特征。由于砖瓦编码和强化学习没有直接关联，本书没有用过多的篇幅介绍砖瓦编码。\n",
    "\n",
    "实际使用砖瓦编码时，不需要精确计算砖瓦的数量，常随意的大致估计砖瓦的数量作为特征数。如果设置的特征数大于真实的砖瓦数量，那么有些特征永远不会取到，有些浪费；如果设置的特征数小于真实的砖瓦数量，那么有多个砖瓦需要共享特征，具体逻辑可以见代码清单6-3中“冲突处理”部分。这些浪费和冲突往往不会造成明显的性能损失。\n",
    "\n",
    "第118页砖瓦数计算：每个大网格的网格宽度刚好是整个取值范围的1/8。所以，第0层大网格每个纬度有8个大网格；第1~7层由于有偏移，每个纬度需要有9个大网格才能覆盖整个取值范围。第117页图6-3b的情况略有不同：这个图中每个纬度的取值范围不是大网格的长度的整数倍。所以有些层偏移后，不需要更多的大网格也可以覆盖整个取值范围。\n",
    "\n",
    "\n",
    "## 第144页第7~11行\n",
    "\n",
    "$\\E_{\\pi\\left(\\bftheta\\right)}\\left[\\sum\\limits_{t=0}^{+\\infty}{\\gamma^ta_{\\pi_k}\\left(\\sfS_t,\\sfA_t\\right)} \\right]$\n",
    "\n",
    "$=\\E_{\\pi\\left(\\bftheta\\right)}\\left[\\sum\\limits_{t=0}^{+\\infty}{\\gamma^t\\left(R_t+\\gamma{v_{\\pi\\left(\\bftheta_k\\right)}}\\left(\\sfS_{t+1}\\right)-v_{\\pi\\left(\\bftheta_k\\right)}\\left(\\sfS_t\\right)\\right)}\\right]$\n",
    "\n",
    "$=\\E_{\\pi\\left(\\bftheta\\right)}\\left[-v_{\\pi\\left(\\bftheta_k\\right)}\\left(\\sfS_0\\right)+\\sum\\limits_{t=0}^{+\\infty}{{\\gamma^t}{R_t}}\\right]$\n",
    "\n",
    "$=-\\E_{\\sfS_0}\\left[v_{\\pi\\left(\\bftheta_k\\right)}\\left(\\sfS_0\\right)\\right]+\\E_{\\pi\\left(\\bftheta\\right)}\\left[\\sum\\limits_{t=0}^{+\\infty}{\\gamma^tR_t}\\right]$\n",
    "\n",
    "$=-\\E_{\\pi\\left(\\bftheta_k\\right)}\\left[G_0\\right]+\\E_{\\pi\\left(\\bftheta\\right)}\\left[G_0\\right].$\n",
    "\n",
    "### 改为\n",
    "\n",
    "$\\E_{\\pi\\left(\\bftheta\\right)}\\left[\\sum\\limits_{t=0}^{+\\infty}{\\gamma^{t}a_{\\pi\\left(\\bftheta_k\\right)}\\left(\\sfS_t,\\sfA_t\\right)}\\right]$\n",
    "\n",
    "$=\\E_{\\pi\\left(\\bftheta\\right)}\\left[\\sum\\limits_{t=0}^{+\\infty}{\\gamma^t\\left(R_{t+1}+\\gamma{v_{\\pi\\left(\\bftheta_k\\right)}}\\left(\\sfS_{t+1}\\right)-v_{\\pi\\left(\\bftheta_k\\right)}\\left(\\sfS_t\\right)\\right)}\\right]$\n",
    "\n",
    "$=\\E_{\\pi\\left(\\bftheta\\right)}\\left[-v_{\\pi\\left(\\bftheta_k\\right)}\\left(\\sfS_0\\right)+\\sum\\limits_{t=0}^{+\\infty}{\\gamma^tR_{t+1}}\\right]$\n",
    "\n",
    "$=-\\E_{\\sfS_0}\\left[v_{\\pi\\left(\\bftheta_k\\right)}\\left(\\sfS_0\\right)\\right]+\\E_{\\pi\\left(\\bftheta\\right)}\\left[\\sum\\limits_{t=0}^{+\\infty}{\\gamma^tR_{t+1}}\\right]$\n",
    "\n",
    "$=-\\E_{\\pi\\left(\\bftheta_k\\right)}\\left[G_0\\right]+\\E_{\\pi\\left(\\bftheta\\right)}\\left[G_0\\right].$\n",
    "\n",
    "\n",
    "## 第146页算法8-5第2.3步\n",
    "\n",
    "更新$\\bftheta$以减小\n",
    "\n",
    "### 改为\n",
    "\n",
    "更新$\\bftheta$以增大\n",
    "\n",
    "\n",
    "## 第150页第9行\n",
    "\n",
    "$\\frac{\\partial}{\\partial\\alpha_k}\\left(\\frac{1}{2}{{\\left(\\mathbf{x}_k+\\alpha_k\\mathbf{p}_k\\right)}^\\mathrm{T}}\\mathbf{F}\\left(\\mathbf{x}_k+\\alpha_k\\mathbf{p}_k\\right)-\\mathbf{g}^{\\mathrm{T}}\\left(\\mathbf{x}_k+\\alpha\\mathbf{p}_k\\right)\\right)=\\alpha_k\\mathbf{p}_k^\\mathrm{T}\\mathbf{F}\\mathbf{p}_k-\\mathbf{p}_k^\\mathrm{T}\\left(\\mathbf{F}\\mathbf{x}_k-\\mathbf{g}\\right)$\n",
    "\n",
    "### 改为\n",
    "\n",
    "$\\frac{\\partial}{\\partial\\alpha_k}\\left(\\frac{1}{2}{{\\left(\\mathbf{x}_k+\\alpha_k\\mathbf{p}_k\\right)}^\\mathrm{T}}\\mathbf{F}\\left(\\mathbf{x}_k+\\alpha_k\\mathbf{p}_k\\right)-\\mathbf{g}^{\\mathrm{T}}\\left(\\mathbf{x}_k+\\alpha_k\\mathbf{p}_k\\right)\\right)=\\alpha_k\\mathbf{p}_k^\\mathrm{T}\\mathbf{F}\\mathbf{p}_k+\\mathbf{p}_k^\\mathrm{T}\\left(\\mathbf{F}\\mathbf{x}_k-\\mathbf{g}\\right)$\n",
    "\n",
    "\n",
    "## 第150页第11行\n",
    "\n",
    "$\\alpha_k=\\frac{\\mathbf{p}_k^\\mathrm{T}\\left(\\mathbf{F}\\mathbf{x}_k-\\mathbf{g}\\right)}{\\mathbf{p}_k^\\mathrm{T}\\mathbf{F}\\mathbf{p}_k}$\n",
    "\n",
    "### 改为\n",
    "\n",
    "$\\alpha_k=\\frac{\\mathbf{p}_k^\\mathrm{T}\\left(\\mathbf{g}-\\mathbf{F}\\mathbf{x}_k\\right)}{\\mathbf{p}_k^\\mathrm{T}\\mathbf{F}\\mathbf{p}_k}$\n",
    "\n",
    "\n",
    "## 第155页第28行\n",
    "\n",
    "$=-\\sum\\limits_\\sfa\\frac{\\pi\\left(\\sfa|\\sfS_t;\\bftheta^\\mathrm{EMA}\\right)}{\\pi\\left(\\sfa|{\\sfS_t};\\bftheta\\right)}.$\n",
    "\n",
    "### 改为\n",
    "\n",
    "$=-\\sum\\limits_\\sfa\\frac{\\pi\\left(\\sfa\\mid\\sfS_t;\\bftheta^\\mathrm{EMA}\\right)}{\\pi\\left(\\sfa\\mid\\sfS_t;\\bftheta\\right)} {\\nabla_{\\bftheta}} \\pi\\left(\\sfa\\mid\\sfS_t;\\bftheta\\right).$\n",
    "\n",
    "\n",
    "## 第173页第20行\n",
    "\n",
    "$=\\E\\left[\\nabla\\pi\\left(\\sfS_0;\\bftheta\\right){\\left[\\nabla_\\sfa q_{\\pi\\left(\\bftheta\\right)}\\left(\\sfS_0,\\sfa\\right)\\right]}_{\\sfa=\\pi\\left(\\sfS_0;\\bftheta\\right)}\\right]+\\gamma\\E\\left[\\nabla\\pi\\left(\\sfS_1;\\bftheta\\right){{\\left[{\\nabla_\\sfa}{q_{\\pi\\left(\\bftheta\\right)}}\\left(\\sfS_1,\\sfa\\right)\\right]}_{\\sfa=\\pi\\left(\\sfS_1;\\bftheta\\right)}}\\right]+\\gamma^2\\E\\left[\\nabla{v}_{\\pi\\left(\\bftheta\\right)}\\left(\\sfS_1\\right)\\right]$\n",
    "\n",
    "### 改为\n",
    "\n",
    "$=\\E\\left[\\nabla\\pi\\left(\\sfS_0;\\bftheta\\right){\\left[{\\nabla_\\sfa}q_{\\pi\\left(\\bftheta\\right)}\\left(\\sfS_0,\\sfa\\right)\\right]}_{\\sfa=\\pi\\left(\\sfS_0;\\bftheta\\right)}\\right]+\\gamma\\E\\left[\\nabla\\pi\\left(\\sfS_1;\\bftheta\\right){{\\left[{\\nabla_\\sfa}{q_{\\pi\\left(\\bftheta\\right)}}\\left(\\sfS_1,\\sfa\\right)\\right]}_{\\sfa=\\pi\\left(\\sfS_1;\\bftheta\\right)}}\\right]+\\gamma^2\\E\\left[\\nabla{v}_{\\pi\\left(\\bftheta\\right)}\\left(\\sfS_2\\right)\\right]$\n",
    "\n",
    "\n",
    "## 第180页图9-1\n",
    "\n",
    "### 改为\n",
    "\n",
    "<img src=\"./images/figure09_01.png\" style=\"width: 200px;\"/>\n",
    "\n",
    "\n",
    "## 第180页第4行\n",
    "\n",
    "$X$轴是水平向下的，$Y$轴是水平向右的。\n",
    "\n",
    "### 改为\n",
    "\n",
    "$X$轴是水平向上的，$Y$轴是水平向左的。\n",
    "\n",
    "\n",
    "## 第207页图11-3\n",
    "\n",
    "### 改为\n",
    "\n",
    "<img src=\"./images/figure11_03.png\" style=\"width: 400px;\"/>\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
